
    Maximum likelihood (ML) estimators for scaled mutation parameters with a strand symmetric mutation model in equilibrium

    With the multiallelic parent-independent mutation-drift model, the equilibrium proportions of alleles are known to be Dirichlet distributed. A special case is the biallelic model, in which the proportions are beta distributed. A sample taken from these models is then Dirichlet-multinomially or beta-binomially distributed, respectively. Maximum likelihood (ML) estimators for the mutation parameters of the biallelic parent-independent mutation model are available via an expectation-maximization algorithm. Assuming small scaled mutation rates, the distribution of a sample of size M can be expanded in a first-order Taylor series. The ML estimators for the two parameters in the biallelic model can then be expressed using the site frequency spectrum. In this article, we go beyond parent-independent mutation and analyse a strand-symmetric mutation model with six scaled mutation parameters that deviates from parent-independent mutation and, generally, from detailed balance. We derive ML estimators for these six parameters assuming mutation-drift equilibrium and small scaled mutation rates. This is the first time that ML estimators are provided for a mutation model more complex than parent-independent mutation.
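
    For the biallelic case, the sample distribution and a small-rate expansion of it can be sketched in Python. This is a hand-derived illustration, not the paper's estimator: it assumes the equilibrium proportion of allele 1 is Beta(alpha, beta), with alpha and beta standing for the two scaled mutation parameters (the scaling convention and the function names are assumptions). It computes the exact beta-binomial probability of n copies of allele 1 in a sample of size M and a first-order approximation in which each polymorphic class is proportional to M/(n(M - n)).

```python
from math import comb, exp, lgamma


def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(n, M, alpha, beta):
    """Exact P(n copies of allele 1 in a sample of size M) when the
    population proportion of allele 1 is Beta(alpha, beta) distributed."""
    return comb(M, n) * exp(log_beta_fn(n + alpha, M - n + beta) - log_beta_fn(alpha, beta))


def first_order_pmf(n, M, alpha, beta):
    """First-order (small alpha, beta) approximation of beta_binomial_pmf."""
    theta = alpha + beta
    harmonic = sum(1.0 / k for k in range(1, M))  # H_{M-1}
    poly_scale = alpha * beta / theta
    if n == 0:  # monomorphic for allele 2
        return beta / theta - poly_scale * harmonic
    if n == M:  # monomorphic for allele 1
        return alpha / theta - poly_scale * harmonic
    return poly_scale * M / (n * (M - n))  # polymorphic classes


if __name__ == "__main__":
    M, alpha, beta = 10, 0.01, 0.02
    for n in range(M + 1):
        exact = beta_binomial_pmf(n, M, alpha, beta)
        approx = first_order_pmf(n, M, alpha, beta)
        print(n, round(exact, 5), round(approx, 5))
```

    The approximation sums to one exactly and differs from the exact probabilities only by terms of second order in the rates, which is the kind of expansion that lets ML estimators be written in terms of site frequency counts.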

    The expected sample allele frequencies from populations of changing size via orthogonal polynomials

    Funding: CV’s research was supported by the Austrian Science Fund (FWF): DK W1225-B20; LCM’s by the School of Biology at the University of St Andrews.
    In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by the hidden Markov model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele frequency spectra is explored, and sufficient statistics for stepwise changes in population size are found. Further, the convergence behaviour of the polymorphic sample spectra under population size changes on different time scales is examined and discussed in the context of inferring the effective population size. Joint visual assessment of the sample spectra and of the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.
    Peer reviewed
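
    The article computes exact spectra with a spectral (orthogonal-polynomial) forward algorithm. As a much cruder point of comparison, the effect of a single change in population size on the sample spectrum can be sketched by brute-force forward iteration of a biallelic Wright-Fisher chain with mutation. This is a swapped-in technique, not the article's method; the Wright-Fisher dynamics, N as the number of gene copies, the instantaneous resampling at the size change, and all function names are assumptions for illustration.

```python
from math import comb

import numpy as np


def binomial_matrix(n_from, n_to, u=0.0, v=0.0):
    """Row i holds Binomial(n_to, p) probabilities with p = i / n_from after
    mutation (u: allele 2 -> allele 1, v: allele 1 -> allele 2)."""
    p = np.arange(n_from + 1) / n_from
    p = p * (1.0 - v) + (1.0 - p) * u
    j = np.arange(n_to + 1)
    coeff = np.array([comb(n_to, k) for k in range(n_to + 1)], dtype=float)
    return coeff * p[:, None] ** j * (1.0 - p[:, None]) ** (n_to - j)


def stationary(T):
    """Stationary distribution: left eigenvector of T for eigenvalue 1."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()


def sample_spectrum(pi, N, M):
    """Probability of n = 0..M copies of allele 1 in a sample of size M."""
    return pi @ binomial_matrix(N, M)


def spectrum_after_size_change(N1, N2, t, M, u, v):
    """Equilibrium at size N1, instantaneous change to N2, then t generations."""
    pi = stationary(binomial_matrix(N1, N1, u, v))
    pi = pi @ binomial_matrix(N1, N2)  # resample N2 gene copies once
    T2 = binomial_matrix(N2, N2, u, v)
    for _ in range(t):
        pi = pi @ T2  # one Wright-Fisher generation with mutation
    return sample_spectrum(pi, N2, M)
```

    Evaluating `spectrum_after_size_change(40, 20, t, 10, 0.005, 0.005)` for increasing `t` shows the sample spectrum relaxing from the old equilibrium towards the equilibrium of the new size; the matrix approach scales poorly with N and is only meant to make the idea concrete.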

    Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed)

    CV and BY were supported by the Austrian Science Fund (FWF; DK W1225-B20); MK and HK were supported by the Austrian Science Fund (FWF; SFB F6101 and F6106). This work was also partially funded by the Vienna Science and Technology Fund (WWTF) (10.47379/MA16061 to CK). LCM’s research was funded by the School of Biology at the University of St Andrews.
    Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states, and the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, such that consecutive observations can only switch between states that are neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred.
    Peer reviewed
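
    The model structure described above can be sketched as a scaled forward algorithm for a Gaussian-emission HMM whose states are ordered by their means and whose transition matrix only lets the hidden state stay put or move to an adjacent state in that ordering. This is an illustration under simplifying assumptions (normal emissions with a shared standard deviation, a fixed transition matrix, hypothetical function names); it computes a likelihood only and is not the inference procedure of the oHMMed R package.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def neighbour_transitions(n_states, stay=0.9):
    """Tridiagonal transition matrix: stay, or move to an adjacent state."""
    P = np.zeros((n_states, n_states))
    for i in range(n_states):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        if not neighbours:  # a single state can only stay
            P[i, i] = 1.0
            continue
        P[i, i] = stay
        for j in neighbours:
            P[i, j] = (1.0 - stay) / len(neighbours)
    return P


def forward_loglik(obs, means, sd, P, init=None):
    """Scaled forward algorithm: log-likelihood of obs under an HMM whose
    hidden states emit Normal(means[k], sd) and are ordered by their means."""
    means = np.asarray(means, dtype=float)
    if np.any(np.diff(means) <= 0):
        raise ValueError("state means must be strictly increasing")
    k = len(means)
    alpha = np.full(k, 1.0 / k) if init is None else np.asarray(init, dtype=float)
    loglik = 0.0
    for t, x in enumerate(obs):
        if t > 0:
            alpha = alpha @ P  # propagate through neighbour-only transitions
        alpha = alpha * normal_pdf(x, means, sd)
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid underflow on long sequences
    return loglik
```

    Restricting transitions to neighbours keeps the transition matrix tridiagonal, so the number of free transition parameters grows linearly rather than quadratically with the number of states.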

    Modelling and inferring neutral and non-neutral forces shaping genomic site frequencies

    Single nucleotide polymorphisms in samples of DNA sequences from one or multiple populations can be summarised as site frequency spectra. Since polymorphic sites are known to be predominantly biallelic, models for the evolution of allele frequencies that assume low scaled mutation rates are justified. The biallelic boundary-mutation Moran model with reversible mutations (BMM) arises as an approximation to the classic Moran model under this consideration, and it underpins this PhD thesis. In the introduction, the BMM is presented as a mathematically tractable model that is efficient in its use of site frequency data for inferring mutation and selection parameters. Chapter 2 of this thesis extends the BMM to include balancing selection, in addition to biased mutations and a directional component (e.g., directional selection or biased gene conversion). In Chapter 3, discrete and stochastic demographic changes are incorporated into the spectral representation of the neutral BMM. A Hidden Markov Model-inspired approach is used to simulate sample spectra under different scenarios and to propose a new inference method. A novel class of Hidden Markov Models with ordered hidden states and emission densities (oHMMed) is introduced in Chapter 4, alongside the source code of a corresponding R package. In Chapter 5, oHMMed is used to annotate the genome of orangutans according to average levels of GC content and recombination rates. Site frequency spectra of similar regions are subjected to Markov chain Monte Carlo analyses based on the BMM and to demographic inference as described in Chapter 3, and the regions are further characterised by structural genomic features. Overall, this provides a quantification of how biased gene conversion and recombination shape the background variation in hominid site frequency data. Used together, the methods developed in this thesis could help inform an extended null model of evolution and improve genome scans.
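
    Under one assumed convention, the neutral BMM equilibrium has a closed form that can be checked against the model's generator. The sketch assumes (hypothetically, for illustration) N gene copies, mutation only out of monomorphic states at rates alpha (0 -> 1) and beta (N -> N - 1), and Moran drift i -> i ± 1 at rate i(N - i)/N. Detailed balance then gives equilibrium weights proportional to beta, N·alpha·beta/(i(N - i)) and alpha for i = 0, 0 < i < N and i = N.

```python
import numpy as np


def bmm_generator(N, alpha, beta):
    """Generator (rate matrix) of a neutral boundary-mutation Moran model.
    State i = number of allele-1 copies among N gene copies."""
    Q = np.zeros((N + 1, N + 1))
    Q[0, 1] = alpha      # mutation out of the monomorphic state 0
    Q[N, N - 1] = beta   # mutation out of the monomorphic state N
    for i in range(1, N):
        drift = i * (N - i) / N  # neutral Moran drift, equal in both directions
        Q[i, i - 1] = drift
        Q[i, i + 1] = drift
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a generator sum to zero
    return Q


def bmm_stationary(N, alpha, beta):
    """Closed-form equilibrium of the generator above."""
    i = np.arange(1, N)
    pi = np.empty(N + 1)
    pi[0] = beta
    pi[N] = alpha
    pi[1:N] = N * alpha * beta / (i * (N - i))
    return pi / pi.sum()
```

    Multiplying `bmm_stationary(10, 0.02, 0.05)` by `bmm_generator(10, 0.02, 0.05)` gives a zero vector up to rounding, which confirms stationarity under these assumptions; the polymorphic classes have the familiar 1/(i(N - i)) shape.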

    A nearly-neutral biallelic Moran model with biased mutation and linear and quadratic selection

    CV’s research is supported by the Austrian Science Fund (FWF): DK W1225-B20; LCM’s is supported by the School of Biology at the University of St Andrews and has been partially funded through the Vienna Science and Technology Fund (WWTF), Austria [MA016-061].
    In this article, a biallelic reversible mutation model with linear and quadratic selection is analysed. The approach reconnects to one proposed by Kimura (1979), who starts from a diffusion model and derives its equilibrium distribution up to a constant. We use a boundary-mutation Moran model, which approximates a general mutation model for small effective mutation rates, and derive its equilibrium distribution for polymorphic and monomorphic variants in small to moderately sized populations. Using this model, we show that biased mutation rates and linear selection alone can cause patterns of polymorphism within populations and of substitution rates between populations that are usually ascribed to balancing or overdominant selection. We illustrate this using a data set of short introns and fourfold degenerate sites from Drosophila simulans and Drosophila melanogaster.
    Peer reviewed
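
    Because a boundary-mutation Moran model only moves by one copy at a time, its equilibrium for small to moderate N can be computed numerically from detailed balance as a cumulative product of up/down rate ratios. The sketch assumes, hypothetically, only linear (directional) selection, with allele 1 having relative fitness 1 + s, mutation out of the two monomorphic states at rates alpha and beta, and Moran reproduction weighted by fitness; quadratic selection and the article's own parametrisation are not reproduced.

```python
import numpy as np


def moran_bmm_stationary(N, alpha, beta, s=0.0):
    """Equilibrium of a boundary-mutation Moran birth-death chain with
    hypothetical linear selection (allele 1 has relative fitness 1 + s)."""
    i = np.arange(N + 1, dtype=float)
    w = i * (1.0 + s) + (N - i)           # total fitness in state i
    up = i * (1.0 + s) * (N - i) / w      # i -> i + 1 (reproduction by allele 1)
    down = i * (N - i) / w                # i -> i - 1 (reproduction by allele 2)
    up[0] = alpha                         # mutation out of monomorphic state 0
    down[N] = beta                        # mutation out of monomorphic state N
    # Detailed balance: pi[k + 1] / pi[k] = up[k] / down[k + 1]; work in logs.
    log_pi = np.concatenate(([0.0], np.cumsum(np.log(up[:-1]) - np.log(down[1:]))))
    pi = np.exp(log_pi - log_pi.max())
    return pi / pi.sum()
```

    With s = 0 this reduces to the neutral boundary-mutation equilibrium (weights proportional to beta, N·alpha·beta/(i(N - i)) and alpha); with s > 0 every up/down ratio increases, so probability mass shifts towards states with more copies of allele 1.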